Bayes' rule

In probability theory and applications, Bayes' rule relates the odds of event A_1 to event A_2, before and after conditioning on event B. The relationship is expressed in terms of the Bayes factor, \Lambda. Bayes' rule is derived from and closely related to Bayes' theorem. Bayes' rule may be preferred over Bayes' theorem when the relative probability (that is, the odds) of two events matters, but the individual probabilities do not. This is because in Bayes' rule, P(B) is eliminated and need not be calculated (see Derivation). It is commonly used in science and engineering, notably for model selection.

Under the frequentist interpretation of probability, Bayes' rule is a general relationship between O(A_1:A_2) and O(A_1:A_2|B), for any events A_1, A_2 and B in the same event space. In this case, \Lambda represents the impact of the conditioning on the odds.

Under the Bayesian interpretation of probability, Bayes' rule relates the odds on probability models A_1 and A_2 before and after evidence B is observed. In this case, \Lambda represents the impact of the evidence on the odds. This is a form of Bayesian inference: the quantity O(A_1:A_2) is called the prior odds, and O(A_1:A_2|B) the posterior odds. By analogy to the prior and posterior probability terms in Bayes' theorem, Bayes' rule can be seen as Bayes' theorem in odds form. For more detail on the application of Bayes' rule under the Bayesian interpretation of probability, see Bayesian model selection.

The rule

Single event

Given events A_1, A_2 and B, Bayes' rule states that the conditional odds of A_1:A_2 given B are equal to the marginal odds of A_1:A_2 multiplied by the Bayes factor \Lambda:

O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2) ,

where

\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)}.

In the special case that A_1 = A and A_2 = \neg A, this may be written as

O(A|B) = \Lambda(A|B) \cdot O(A) .
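
As a numerical illustration, the single-event rule can be applied directly in a few lines of Python. This is a minimal sketch; the values P(A) = 0.2, P(B|A) = 0.9 and P(B|\neg A) = 0.3 are assumed purely for the example and do not come from the article.

def bayes_factor(p_b_given_a1, p_b_given_a2):
    # Lambda(A1:A2|B) = P(B|A1) / P(B|A2)
    return p_b_given_a1 / p_b_given_a2

def posterior_odds(prior_odds, lam):
    # O(A1:A2|B) = Lambda(A1:A2|B) * O(A1:A2)
    return lam * prior_odds

# Special case A1 = A, A2 = not A, with assumed numbers:
prior = 0.2 / 0.8                  # O(A) = P(A) / P(not A) = 1:4
lam = bayes_factor(0.9, 0.3)       # assumed P(B|A) = 0.9, P(B|not A) = 0.3, so Lambda = 3
print(posterior_odds(prior, lam))  # 0.75, i.e. posterior odds of 3:4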

Multiple events

Bayes' rule may be conditioned on an arbitrary number of events. For two events B and C,

 O(A_1:A_2|B \cap C) = \Lambda(A_1:A_2|B \cap C) \cdot \Lambda(A_1:A_2|B) \cdot O(A_1:A_2) ,

where

\Lambda(A_1:A_2|B) = \frac{P(B|A_1)}{P(B|A_2)} ,
\Lambda(A_1:A_2|B \cap C) = \frac{P(C|A_1 \cap B)}{P(C|A_2 \cap B)} .

In the special case that A_1 = A and A_2 = \neg A, the equivalent notation is

 O(A|B \cap C) = \Lambda(A|B \cap C) \cdot \Lambda(A|B) \cdot O(A).
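
Sequential updating with several pieces of evidence can be sketched in the same way: each new event contributes one further Bayes factor, and the factors simply multiply. The probability values below are assumed for illustration only.

import math

def odds_after_evidence(prior_odds, bayes_factors):
    # O(A1:A2 | B, C, ...) = O(A1:A2) times the product of the successive Bayes factors
    return prior_odds * math.prod(bayes_factors)

prior = 1.0 / 3.0            # assumed O(A1:A2) = 1:3
lam_b = 0.8 / 0.4            # assumed P(B|A1) / P(B|A2) = 2
lam_c = 0.6 / 0.2            # assumed P(C|A1,B) / P(C|A2,B) = 3
print(odds_after_evidence(prior, [lam_b, lam_c]))   # 2.0, i.e. posterior odds of 2:1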

Derivation

Consider two instances of Bayes' theorem:

P(A_1|B) = \frac{1}{P(B)} \cdot P(B|A_1) \cdot P(A_1),
P(A_2|B) = \frac{1}{P(B)} \cdot P(B|A_2) \cdot P(A_2).

Combining these gives

\frac{P(A_1|B)}{P(A_2|B)} = \frac{P(B|A_1)}{P(B|A_2)} \cdot \frac{P(A_1)}{P(A_2)}.

Now defining

O(A_1:A_2|B)  \triangleq \frac{P(A_1|B)}{P(A_2|B)}
O(A_1:A_2) \triangleq \frac{P(A_1)}{P(A_2)}
\Lambda(A_1:A_2|B) \triangleq  \frac{P(B|A_1)}{P(B|A_2)},

this implies

O(A_1:A_2|B) = \Lambda(A_1:A_2|B) \cdot O(A_1:A_2).

A similar derivation applies for conditioning on multiple events, using the appropriate extension of Bayes' theorem.
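
The cancellation of P(B) can be checked numerically. The following sketch uses assumed values P(A_1) = 0.3, P(A_2) = 0.7, P(B|A_1) = 0.9 and P(B|A_2) = 0.2; the ratio of the two posteriors computed via Bayes' theorem agrees with the Bayes factor times the prior odds.

p_a1, p_a2 = 0.3, 0.7          # assumed prior probabilities (here A2 = not A1)
p_b_a1, p_b_a2 = 0.9, 0.2      # assumed likelihoods P(B|A1), P(B|A2)

p_b = p_b_a1 * p_a1 + p_b_a2 * p_a2     # P(B) by the law of total probability
post_a1 = p_b_a1 * p_a1 / p_b           # Bayes' theorem for A1
post_a2 = p_b_a2 * p_a2 / p_b           # Bayes' theorem for A2

lhs = post_a1 / post_a2                     # O(A1:A2|B)
rhs = (p_b_a1 / p_b_a2) * (p_a1 / p_a2)     # Lambda(A1:A2|B) * O(A1:A2)
print(lhs, rhs)                             # both ~1.9286; P(B) has cancelled out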

Examples

Frequentist example

Consider the drug testing example in the article on Bayes' theorem.

The same results may be obtained using Bayes' rule. The prior odds on an individual being a drug user are 199 to 1 against, as \textstyle 0.5\%=\frac{1}{200} and \textstyle 99.5\%=\frac{199}{200}. The Bayes factor when an individual tests positive is \textstyle \frac{0.99}{0.01} = 99:1 in favour of being a drug user: this is the ratio of the probability of a drug user testing positive to the probability of a non-drug user testing positive. The posterior odds on being a drug user are therefore \textstyle 1 \times 99 : 199 \times 1 = 99:199, which is very close to \textstyle 100:200 = 1:2. In round numbers, only one in three of those testing positive are actually drug users.
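
The same arithmetic as a minimal Python sketch, using the figures from the example above (0.5% prevalence, a 0.99 probability that a drug user tests positive, and a 0.01 probability that a non-drug user tests positive):

prior_odds = 0.005 / 0.995            # 1:199 against being a drug user (0.5% prevalence)
bayes_factor = 0.99 / 0.01            # 99:1 in favour, given a positive test

posterior_odds = bayes_factor * prior_odds               # 99/199, about 0.497
posterior_prob = posterior_odds / (1 + posterior_odds)   # 99/298, about 0.332

print(posterior_odds, posterior_prob)   # roughly one in three test-positives are drug users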
